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One of the most basic and general tasks faced by all nervous systems is extracting 
relevant information from the organism's surrounding world. While physical signals 
available to sensory systems are often continuous, variable, overlapping, and noisy, 
high-level neuronal representations used for decision-making tend to be discrete, specific, 
invariant, and highly separable. This study addresses the question of how neuronal 
specificity is generated. Inspired by experimental findings on network architecture in 
the olfactory system of the locust, I construct a highly simplified theoretical framework 
which allows for analytic solution of its key properties. For generalized feed-forward 
systems, I show that an intermediate range of connectivity values between source- and 
target-populations leads to a combinatorial explosion of wiring possibilities, resulting in 
input spaces which are, by their very nature, exquisitely sparsely populated. In particular, 
connection probability V2, as found in the locust antennal-lobe-mushroom-body circuit, 
serves to maximize separation of neuronal representations across the target Kenyon 
cells (KCs), and explains their specific and reliable responses. This analysis yields a 
function expressing response specificity in terms of lower network parameters; together 
with appropriate gain control this leads to a simple neuronal algorithm for generating 
arbitrarily sparse and selective codes and linking network architecture and neural coding. 
I suggest a straightforward way to construct ecologically meaningful representations from 
this code. 
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INTRODUCTION 

Animals all use information about their surrounding world in 
order to function within it. Nervous systems have specialized 
in gathering, processing, storing, and retrieving such informa- 
tion and in using it to make decisions necessary for survival. To 
accomplish these tasks, the brain must disregard much of the 
information made available by the senses, extracting only what 
is relevant for the animal's needs. Just as in drawing a map of a 
newly discovered land, the brain, in so doing, creates a schematic 
internal representation of the animal's world — and it is over this 
internal model that generalizations are drawn, categories are dis- 
cerned, associations made, and behavior triggered [Marr, 1970, 
1971; Barlow, 1985; von der Malsburg, 1986, 1990; Kanerva, 1988; 
Foldiak, 1990; reviewed in deCharms and Zador (2000)]. 

By virtue of the choice of what to keep in it, this internal neu- 
ronal representation is tailored to the organism's needs; and just 
as a historian, geologist, and meteorologist would each draw a dif- 
ferent map of the same piece of land, it too suggests alternate ways 
of viewing and interpreting reality (Barlow, 1972; Kanerva, 1988; 
Churchland and Sejnowski, 1990). In other words, a subjective 
internal model of the world serves as a substrate for performing 
computations which — by predicting the outcome of actions in the 
real world — allow efficient decision-making, even in novel situa- 
tions (von der Malsburg, 1986, 1990; Kanerva, 1988). This maybe 
the core of what the brain does. 



Olfactory systems, which are in evolutionary terms ancient 
and found even in simple animals, accomplish this task very effi- 
ciently. The signals they analyze are plumes of airborne molecules 
and complex mixtures thereof — variable signals occurring on 
highly noisy background (Kadohisa and Wilson, 2006; Raman 
and Stopfer, 2010; Raman et al., 201 1) — and from this input they 
extract meaning (such as "food," "predator," or "potential sex- 
ual partner"), which is translated into behavioral output (actions 
such as foraging, escape, or courtship, respectively). 

How is this task accomplished by neural hardware? Circuit 
architecture is a key to understand brain dynamics and function. 
A full characterization of neural circuitry — including cell types 
and their integrative properties (input-output functions), con- 
nectivity between them (statistics, pattern, signs, and strengths) 
and external input driving the network (rates, auto- and cross- 
correlations, synchrony, etc.) — is necessary, though not sufficient, 
for transcending the descriptive level and distilling the system's 
design principles (Churchland and Sejnowski, 1992). This in turn 
yields a deeper understanding of how basic network features and 
their interrelations give rise to its higher properties. Few bio- 
logical neural systems, however, are presently characterized in 
sufficient detail; most are riddled with complexity, knowledge 
gaps, and high-dimensional parameter-spaces. 

One example where detailed knowledge exists on network 
parameters and coding schemes is the olfactory system of the 
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locust (Schistocerca americana) (Figure 1A). In this relatively sim- 
ple system, 800 broadly tuned and noisy second-order neurons 
(projection neurons, PNs) project directly onto 50,000 third- 
order neurons (Kenyon cells, KCs), which are highly selective and 
reliable in their odor responses (Perez-Orive et al., 2002). As the 
system is feed-forward, small, well-defined, and displays a dra- 
matic change in coding — from distributed to sparse — between 
source- and target-populations, it seems well suited for studying 
the origins of neuronal specificity. 

The locust olfactory system (Figure 1A) receives odor input 
through the antenna, via ~90,000 olfactory receptor-neurons 



A antenna antennal lobe 



B 



FIGURE 1 | Framework for studying the separation of neuronal 
representations. (A) Circuit diagram of the locust olfactory system. Odor 
information reaches the antennal-lobe via ~90,000 olfactory receptor-neurons 
in the antenna. In the antennal-lobe, ~800 projection neurons (PNs, yellow) 
project it further to the mushroom body (onto ~50,000 Kenyon cells (KCs), 
blue; PN-KC connection probability Vz) and the lateral horn (not shown). In 
transition from PNs to KCs the odor code dramatically changes from broad 
and highly distributed (in PNs) to sparse and specific (in KCs). KC axons split 
into the a- and (i-lobes, where they synapse onto a- and (5-lobe extrinsic 
neurons (green), respectively (KC-p-lobe-neuron connection probability 
~0.02). Red arrows indicate direction of information flow. See text for more 



(ORNs) which terminate in the antennal-lobe. The antennal- 
lobe is a small network: ~800 excitatory PNs which send their 
axons to the next relays in the system (forming the antennal-lobe's 
sole output), and ~300 inhibitory interneurons (not shown in 
the diagram) which act locally within the network (Laurent and 
Davidowitz, 1994; Leitch and Laurent, 1996). PNs each respond 
to a wide array of odors with rich, complex spike-trains encod- 
ing odor identity (Laurent and Davidowitz, 1994; Laurent, 1996; 
Wehr and Laurent, 1996; Perez-Orive et al., 2002; Mazor and 
Laurent, 2005) and concentration (Stopfer et al., 2003); PN- 
spike-trains are additionally locked to a 20 Hz oscillatory cycle 



mushroom body 



horn 



; neurons 

neurons 
neurons 



details. (B) Mathematical framework for studying transformation in coding. 
Model represents the state of a theoretical network inspired by PN-KC 
circuitry during a brief snapshot in time. Color code and information flow 
same as in (A). A set of W source-neurons (activity of which is denoted by 
binary-valued vector S; i.i.d. with probability p) projects onto a set of M target 
neurons (activity of which is denoted by vector K) via a set of feed-forward 
connections (binary-valued connectivity matrix |/v; i.i.d. with probability c). 
The aggregate input to the target layer K is the vector k, a product of 
source-neuron activity vector S and connectivity matrix W- K is obtained by 
thresholding k using the Heaviside function 4>l k — /). 
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which is synchronous across the PN population (Laurent and 
Davidowitz, 1994; Laurent, 1996; Laurent et al, 1996) and is 
reflected in local-field-potential oscillations. With no odor pre- 
sented, PNs fire spontaneously at rates of 2.5-4 Hz (Perez-Orive 
et al, 2002; Mazor and Laurent, 2005). Odors are represented 
by a dynamic combinatorial code (Laurent et al, 1996; Wehr 
and Laurent, 1996) which is broadly distributed across the 
PN population (Perez-Orive et al., 2002; Mazor and Laurent, 
2005). 

Output from the antennal-lobe is projected, via PN axons, 
onto two direct target-areas: the mushroom body, a structure 
involved in learning and memory (Heisenberg, 1998), and the lat- 
eral horn. The mushroom body contains ~50,000 small neurons, 
the KCs (Laurent and Naraghi, 1994; Leitch and Laurent, 1996). 
Individual KCs respond to specific odors (either monomolec- 
ular odors or mixtures), their responses are characterized by 
few spikes, are highly reliable across different presentations of 
the same odor (Perez-Orive et al., 2002), and are often con- 
centration invariant (Stopfer et al, 2003). KC responses occur 
on a background of extremely little spontaneous firing (Laurent 
and Naraghi, 1994; Perez-Orive et al., 2002; Mazor and Laurent, 
2005; Jortner et al, 2007; Jortner, 2009). Mushroom body odor- 
responses thus involve small, highly selective subsets of KCs 
(Perez-Orive et al., 2002, 2004; Stopfer et al, 2003; Jortner, 2009). 

Axons of KCs exit the mushroom body calyx in a tight bundle 
(forming the mushroom's stalk, or pedunculus), branching into 
the mushroom body's output nodes, the a- and f$-lobes (Laurent 
and Naraghi, 1994). There, KC output is integrated by smaller 
populations of extrinsic neurons (called a- and P-lobe neurons, 
respectively; Figure 1A) with large, planar dendritic trees which 
intersect KC-axon bundles at neat right angles (Li and Strausfeld, 
1997; MacLeod et al., 1998; Cassenaer and Laurent, 2007), sug- 
gesting potential integration of precisely timed spikes over a wide 
KC-subpopulation. 

Several previous studies offer theoretical treatment of 
the locust antennal-lobe-mushroom-body transformation (e.g., 
Garcia-Sanchez and Huerta, 2003; Theunissen, 2003; Huerta 
et al, 2004; Sivan and Kopell, 2004; Finelli et al., 2008); these, 
however, lack quantitative data regarding critical network param- 
eters, such as connectivity values. More recent experimental work 
quantified aspects of network architecture via electrophysiolog- 
ical measurements of connectivity between PNs, KCs and f$- 
lobe-neurons (Jortner et al., 2007; Cassenaer and Laurent, 2007). 
Results show that each KC receives synaptic connections from V2 
of all PNs on average (~400 out of ~800 PNs); PN-KC synapses 
are very weak [excitatory-postsynaptic-potential (EPSP) ampli- 
tude is 85±44|iV], and KC firing thresholds correspond to 
simultaneous activation of ~ 100 PN-KC synapses (assuming lin- 
ear summation) (Jortner et al., 2007). Connections between KCs 
and some of their outputs (P-lobes neurons) are, on the other 
hand, sparse (~2% of pairs) and strong (EPSP amplitude 1.58 ± 
1.1 mV), and exhibit Hebbian spike-timing-dependent plasticity 
(Cassenaer and Laurent, 2007). 

Can these findings explain the transformation in coding 
schemes? What is the functional significance of this design? In 
the present study I explore design principles by which the brain 
constructs specific, sparse and high-level representations of the 



surrounding world. A coding strategy both sparse and selective 
would be one where only a small subset of neurons respond to any 
given stimulus or external state (i.e., high population sparseness; 
Willmore and Tolhurst, 2001), and only a small subset of stim- 
uli or external states elicit response in each neuron (Jortner et al., 
2007; Jortner, 2009). Inspired by the network architecture of the 
locust olfactory pathways, I suggest an exciting implementation 
of neuronal hardware to this end. My central claim is that in a 
feed-forward system with connectivity V2, target neurons differ 
maximally from each other in information they contain about 
the world (or external state); in this sense serving as an optimal 
neural module for parsing the world of inputs, and a substrate 
for sparse and specific neuronal-responses on the basis of which 
learning, categorization, generalization, and other essential com- 
putations can occur. The targets' sparseness is set to a controlled, 
arbitrary level by choice of a proper and adaptive firing thresh- 
old. Next, I address these points through a straightforward yet 
rigorous mathematical approach. 

METHODS 

The model I use is highly reduced, consisting of a layer of source- 
neurons (equivalent to PNs), projecting onto a layer of target 
neurons (equivalent to KCs) via a set of feed-forward connec- 
tions (Figure IB). Following several simplifying assumptions, I 
describe the mathematical framework and proceed to solve some 
of its behavior analytically — yielding predictions about function 
and about how network design relates to coding. 

MODEL ASSUMPTIONS 

For the sake of tractability and predictive power, I make four 
important simplifying assumptions. First, I choose to look at a 
"snapshot" of the system in time; a brief-enough segment so that 
for any given PN the probability for spiking more than once is 
negligible. Within this time window, the PN population can be 
treated as a vector of binary digits, one denoting the occurrence of 
a spike and zero denoting none. As a second assumption, all PNs 
are treated each as firing (or not) within this time window with 
probability p which is identical across all PNs, and doing so inde- 
pendently of each other (i.i.d.); this allows treating the PN activity 
vector as binomial with a known parameter. Third, all synap- 
tic connections are treated as equal in strength. As a fourth and 
last assumption, connectivity between PNs and KCs is assumed 
to be random, with i.i.d. statistics and probability c across all 
PN-KC pairs. 

These assumptions, and particularly those of i.i.d. statistics 
of firing and connectivity, wield great predictive power; I will 
revisit them in the Discussion (Section "Regaining Complexity: 
Reexamining the Model's Initial Assumptions"), examine their 
validity with respect to experimental data on the locust olfactory 
system, and assess, wherever biological reality deviates from them 
(e.g., when some dependence and correlations are introduced), 
how model results may be affected. 

MODEL DESCRIPTION 

A schematic cartoon of the network-model appears in Figure IB. 

There is a set of N source-neurons, denoted by vector S (so 
the neurons are Si, S2, ■ ■ ■ , Sjv): these are analogous to PNs in 
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the antennal-lobe. A second set of M target neurons, denoted by 
vector K (so neurons K\,Ki, Km), are analogous to KCs in 
the mushroom body. Source-neurons ( S ) connect to target neu- 
rons (K) through randomly determined connections of uniform 
strength; each PN can thus either connect to a given KC or not, 

with probabilities c and (1 — c), respectively. W is the connection 
matrix, with W« = 1 if the ;'th PN connects to the zth KC and 0 

otherwise. Each row of W indicates the set of PNs physically con- 
nected to a given KC (so there are as many rows as KCs), and each 
column indicates the set of KCs receiving physical connections 
from a given PN (so there are as many columns as PNs). The rows 
I will refer to as the connectivity vectors to KCs. 

As pointed out in the assumptions, the model looks at a snap- 
shot of the neural system during a brief time window. Within it, 
each of the PNs can either fire a spike or not, and does so with 

probabilities p and (1 — p), respectively, so S also takes binary 

values. I call S the activity vector of the PN population, and § will 

be the set of all possible activity vectors, so S € §■ 
Formally, then (Figure IB): 

jl with probabil. p . 
1 ' ~ \ 0 with probabil. 1 — p ~~ 

f 1 with probabil. c 
Wa=\ . , F 11M i= 1.....M; j= 1, ...,N 

1 [0 with probabil. 1 - c ' 

During our given time window each of the M KCs receives 
PN inputs, which additively determine its "membrane-potential." 
The input to each KC, to which I refer throughout this work as 
its aggregate input (denoted by /c, for the ith KC) is the sum of 
all PNs connected to it which fire during that time window, or 
formally 



So K is a binary- valued vector, JC, indicating whether or not the 
ith KC fires, and K is the set of all possible target-neuron activ- 
ity vectors, so K£ K. Thus, in this model, for a network with 
N PNs and M KCs (with threshold /) and a fixed connectivity 

matrix W, a given state of the PN population (denoted by activity 
vector S ) unambiguously determines the activity vector of the KC 
population, K- 

MATHEMATICAL CONVENTIONS, SYMBOLS, AND ABBREVIATIONS 

While the mathematics used throughout this work is mostly ele- 
mentary, some of the derivations are nonetheless rather tedious. 
For the sake of clarity, they appear in shortened form within 
the text; I provide commented step-by-step derivations in the 
Appendix. 

All but the most standard mathematical symbols used are 
defined the first time they appear. For quick reference, they are 
also listed in Table 1 . 

RESULTS 

MODEL RESULTS I: EXPLORING PROPERTIES OF THE CONNECTIVITY 
MATRIX 

Examining the set of connections between the neuronal popu- 

-* -> 

lations S and K (connectivity matrix W), we may ask to what 

+± 

extent two connectivity vectors (rows of W) differ from each 
other. Let us calculate how many binary digits will, on aver- 
age, differ across two such connectivity vectors (which I call 
U and V). This difference-measure is the Hamming distance 
between the two vectors, denoted by H(U, V). Since all ele- 
ments of the connectivity matrix are independent from each 
other, we can simply calculate the probability that an element of 
U differs from the matching element in ^(detailed derivation in 
Appendix Al): 



k = W ■ S 

N 

;'= i 

Thus, k is a vector which takes natural values between 0 and N 
(according to how many of the PNs converging onto the KC fire). 
Each KC then fires a spike if and only if its aggregate input equals 
or exceeds the firing threshold, /, or 

K= 3>(k-f) = 3>(w ■ S-f] 
where O (X) denotes the Heaviside function: 



<t>(X) 



(0 oi 



X > 0 
otherwise 



Vr(Uj # Vj) = Pr(Uj = 1, Vj = 0) + Pr(l/ ; = 0, V) = 1) 
= c(l - c) + (1 - c)c = 2c(l - c) 

and multiply by the total number of elements N to get the 
Hamming distance: 

(H(U, V)) 0 y =N-MUj # Vj) = 2Nc(l - c) 

As this expression shows, when viewed as a function of the con- 
nection probability, c, the Hamming distance between two rows 

of W is maximal for c = Vz, and drops symmetrically around 
it (Figure 2A, for N = 800). Thus, under the model assump- 
tions, PN-KC connectivity vectors will on average be maximally 
different (as measured by Hamming distance) from each other 
when each pair of cells (PN and KC) is equally likely to be con- 
nected or not. This already suggests some special property of 
the experimentally observed connectivity matrix (Jortner et al., 
2007). 

If we now pick two connectivity vectors at random, what is 
the probability that they are identical? In other words, what is the 
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Table 1 | Mathematical symbols used throughout the paper. 

c Probability of PN-KC connection (scalar, real within 

interval [0,1]) 

CDF Cumulative Distribution Function 

CLT Central Limit Theorem 

Dfx, y) Absolute difference between x and y,\x-y\ (For 

binary values: (x - y) 2 ) 
f Kenyon-cell firing threshold (in units of PN 

inputs) — (scalar, non-negative) 
H(X, Y) Hamming distance between binary-valued vectors 

X, Y; number of bits by which they differ (scalar, real, 

non-negative) 

i.i.d. independent and identically distributed 

K Activity vector of the Kenyon-cell population (vector, 

Mx1, binary values) 
k vector of aggregate inputs to Kenyon cells (vector, 

Mx1, natural values) 
K Set of all possible Kenyon-cell activity vectors 

M Total number of Kenyon-cells (scalar, natural) 

N total number of PNs (scalar, natural) 

p PN-firing probability within characteristic time window 

(scalar, real within interval [0,1]) 
PDF Probability Density Function 

Pr(x) Probability of x 

Q; Q(x) The Standard Normal cumulative distribution function; 

its value at x 

S Activity vector of the PN population (vector, A/x1 , 

binary values) 

S Set of all possible PN activity vectors 

Random PN-KC connectivity vectors; rows of W 
(vectors, 1 x N, binary values) 
Random subsets of PNs 

Connectivity matrix between PNs and KCs (Matrix, 
M x N, Binary values) 

variance of aggregate input to a KC (scalar, 
non-negative) 

The Heaviside (step) function; producing 1 if x > 0 
and 0 otherwise 

Mean aggregate input to a KC (scalar, real) 
Pearson's correlation coefficient between x and y 
Equals in distribution 
Equals by definition 

Intersection of sets A and B (objects which belong to 
both A and B) 

Union of sets A and B (objects which belong to A or 
B, inclusive or) 

Symmetric difference of sets A and 6 (objects belong 
to A or B, but not both) 
Number of elements in set A 

Relative complement of sets A and B (objects belong 
to A and not to 6) 
Factorial of x 
Absolute value of x 

Expected value of X over all possible values of Y 
(with their respective probabilities) 



U, V 

u,v 
W 

A 

*(x) 
* 

p(x, y) 

AC\B 
A{JB 
AAB 

H 

A\B 

x! 

\x\ 
My 

x 
y 



probability that two randomly chosen KCs sample the exact same 
ensemble of PNs? 



Pr(H (U, V) = 0) = Pr|^ J^iUj - Vj) 2 = Oj 
= Pr(U,- = V ; |Vj) 

= (Pr([/ ; - = 1, Vj = 1) + Pr(Uj = 0, Vj = 0)) N 
= ( c 2 + (1 _ c) 2^ = (2c 2_ 2c+l) N 

Similarly, the probability that the two connectivity vectors differ 
from each other by exactly d PNs is: 



Pr(H(U, V) =d) = Pr Jj U ) ~ V ^= d = (H U j =1 > V i= l ) 



+ Pr(Uj = 0, V, = 0)) N d ■ (Pr([/ ; = 1, V, = 0) + . 
+ Pr(Uy = 0, Vj = l)) d ■ 



(2c 1 - 2c + if d ■ {2c(\-c)) d 



d\(N -d)\ 



This yields a theoretical probability-density function (PDF) 
for the Hamming distance between connectivity vectors 
(Figures 2B-D). Note that for all values of c the PDFs are always 
rather narrow (Figure 2B), with most of their mass concentrated 
close to their mean value. This is a key property of binomial 
distributions with large values of N, and implies that most 
pairs of connectivity vectors in a system obeying our basic 
assumptions will differ by similar values, well predicted by their 
mean Hamming distance. Note also, that the PDF centered on 
the highest value is for c = Vj, the connectivity value measured 
between PNs and KCs in the locust. Figures 2C,D provide a 
closer look at this particular case (see next section). 

Connectivity Vi thus maximizes differences between PN-KC 
connectivity vectors. I demonstrate this graphically in Figure 3 
using elementary Venn diagrams. Two different KCs, each of 
which samples PNs randomly and independently with probability 
c, thus define two sets of PNs (I call these sets u and v). Each large 
(open) circle in Figure 3A represents the entire PN set (with area 
N), the two smaller circles within it mark the PN subsets u and v 
sampled by our two KCs (with average area N ■ c each; the value 
of c is indicated above each diagram). 

The set of PNs sampled by both KCs (the overlap of the two PN 
sets) is the intersection of u and v, the number of PNs it includes 
on average is 



lunvll 



Nc z 



u,v 



x-choose-y, the number of ways to pick y elements 
out of x 



as demonstrated by the dark-shaded areas in Figures 3A,B- 
Similarly, the set of PNs sampled by exactly one of the two KCs 
(the non- overlapping portion of inputs to the two KCs, or their 
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Hamming Distance Hamming Distance 



FIGURE 2 | Analysis of Hamming distances between PN-KC 
connectivity vectors. (A) Analytic solution for mean Hamming distance 
between PN-KC connectivity vectors as a function of connectivity, c (for 
N = 800 PNs). Hamming distance is a parabola in c with maximum at 
c = V2. (B) Probability-density functions (PDFs) of Hamming distances 
between PN-KC connectivity vectors, Pr(H(U, V) = /), calculated as a 
function of connectivity. Each row corresponds to one PDF (with PN-KC 
connectivity value on ordinate); color represents probability. N = 800 PNs 
assumed in all cases. Note narrow distribution of Hamming distances 



around their mean for all values of c. Note PDF centered around highest 
value (400) is for c = V2. (C) Theoretically calculated PDF of Hamming 
distances between PN-KC connectivity vectors for parameters measured 
in the locust olfactory system (A/ = 800, c = V&). Note most common 
Hamming distance value, 400 (as each of two KCs samples on average 
200 PNs that the other does not, see text). Shaded area, interval [350,450] 
in which most probability-density (0.9997) is concentrated. (D) Linear-log 
plot of same PDF as in (C). Shaded area, interval [350,450]. Note 
miniscule values away from the mean (outside shaded area). 



symmetric difference, A, in set theory terms) is the union of u and 
v minus their intersection; the average number of PNs it includes: 

<||uAv||) UiV = <||(«Uv)\(«nv)||) UiV 

IN N N \ 

= ( E u i + E Y/- 2 E m) =22Sfcci - c > 

V = 1 ; = 1 ; = 1 I fjy 



difference between them (the non-overlapping ensemble) peaks 
at V2 (Figure 3C). 

Differences between receptive ranges (or "receptive fields") of 
two target neurons are thus large when they each sample an inter- 
mediate proportion of the source-population — sampling either 
a very small or very large proportion yields much smaller non- 
overlapping ensembles, hence source-populations less different 
from each other. 



which corresponds to the light-shaded area in Figures 3A,B- This 
tells us how much these two KCs differ on average in PN ensem- 
bles they sample (or in their "receptive fields" in terms of input). 
This area is small when c is very low or very high, and maxi- 
mal when c = 0.5 (as seen in Figure 3A, and more clearly in the 
bar graphs in Figure 3B). In fact, this expression is also precisely 
the result we got for Hamming distance between connectivity 
vectors (see above and Figure 2A). The white areas ("None" in 
Figures 3A,B) correspond to PNs not sampled by either of two 
KCs. Both the average union and average intersection of the two 
PN ensembles increase monotonically with connectivity, but the 



PROPERTIES OF THE CONNECTIVITY MATRIX: PLUGGING IN 
REAL-DATA VALUES 

To sense how the above translates into biological reality, let us 
apply these calculations to the connectivity matrix of the locust 
olfactory system. For values relevant to the locust (N = 800 PNs 
and c = Vi) the mean Hamming distance between two PN-KC 
connectivity vectors is 400; two randomly chosen KCs will thus 
overlap by 200 connected PNs on average, and each of the KCs 
will on average sample 200 PNs which the other does not. There 
will be an additional 200 PNs which are not sampled by either of 
the two KCs. 
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FIGURE 3 | Connection probability V4 maximizes differences between 
KC input-populations. (A) Schematic representation of two KC inputs 
using Venn diagrams. Each large (empty) circle represents the entire PN 
population; the shaded circles within it represent two average KCs 
receiving connections from subsets of these PNs (with probability indicated 
above each diagram). Total shaded area (light-shaded + dark-shaded) 
represents the union of the two KC input-ensembles (or "receptive fields"), 
while the dark-shaded area alone is their intersection. The light gray area 
thus corresponds to the non-overlapping portion of the input-ensembles 
(union minus intersection), or to how different KCs are from each other in 
terms of input. (B) Same as in (A) using bar graphs. Each large rectangle 
represents the entire PN population; shaded areas use same color code and 
same connection probabilities as in (A). (C) Analytically calculated curves of 
the union (dotted line), intersection (solid gray) and their difference (red) for 
two KCs in terms of PN input, as a function of PN-KC connection 
probability c. While the two former are both monotonically increasing, the 
latter is maximized at c = V2. Representations of the outside world are thus 
spread maximally across the target neuron population for connectivity V2. 



Figures 2C,D show the predicted distribution of Hamming 
distances between PN-KC connectivity vectors in the locust. Note 
the mean Hamming distance between two KC connectivity vec- 
tors (400) is also by far the most common value; it occurs with 
probability 0.028. The main mass of the distribution is tightly 
concentrated around the mean value (Figure 2C): 0.9997 out of 
a total mass of 1 of the PDF lies within ±50 PN inputs from the 
mean (shaded area in Figure 2C); randomly chosen pairs of KCs 
will thus almost always (in 99.97% of cases) have input-ensembles 
differing by 350-450 PNs. The PDFs take extremely small values 



further away from the mean, as better seen on a semi-logarithmic 
scale (Figure 2D, shaded area is same interval): note the minis- 
cule probabilities outside the interval 350-450. The probability 
that two different KCs will sample the exact same PN ensem- 
ble is ~10~ 241 , and the probabilities that their input-ensembles 
will differ by 1, or 2, or 3 inputs are 10~ 238 , 10~ 235 , and 10~ 233 , 
respectively — vanishingly small numbers in all these cases. 

MODEL RESULTS II: NEURONAL ACTIVITY AND PROPERTIES OF INPUT 
TO KCs 

Up until now, we only considered the properties of the connec- 

tivity matrix, W. To see what happens when neural activity is 
added in, let us put some flesh on the dry skeleton, and proceed to 

explore the aggregate input to KCs ( k ) during network activity — 
corresponding to their sub-threshold membrane-potential. The 
symbol *I> denotes the mean aggregate input to a KC, averaged 
over all possible PN-population states and across all KCs. Then 

* = (fe>4i = ^E = (E w 'j( s j)}j 

N 

=p- EW.' =f * c 

;'= 1 

the mean aggregate input to a KC during our arbitrary time 
window is thus a simple product of the number of PNs, proba- 
bility of spiking in a single PN during this snapshot and PN-KC 
connection probability (Figure 4A). 

A will denote the variance of fc;, averaged across all KCs and 
over all possible PN-population states (Figure 4B) (see Appendix 
A2 for full derivation): 

A ee (varCfc,))^^-^) 2 )-), 

-((lf>H)l 

N N 

= E E (w*-w*) i tos*) s +... 

+ EH,(4- 2 *- E(^>,(S;> S + * 2 
j = 1 j=i 

= Npc(l — pc) 

So we have explicitly expressed the mean and variance of the 
aggregate input k\ (Figures 4A,B) as a function of basic network 
parameters. Note that variable k; is a product of two mutually 
independent, binomially distributed variables: the momentary 
vector of spiking in the PN population [a binomial with parame- 
ters (N, p)], and the vector of connections between the PN set and 
the KC [a binomial with parameters (N, c)]. Their dot product, 
lq, is also a binomial variable, with parameters N and p ■ c, as 
indicated by the calculations of *I> and A. 
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FIGURE 4 | Theoretical properties of network input to KCs. 
(A-D) Analytically calculated properties of the aggregate input to KCs (k) 
during network activity. Aggregate input is also analogous to KC 
membrane-potential (see text). Each property plotted as a function of 
PN-spiking probability p and PN-KC connectivity c, and averaged over all 
antennal-lobe states. N = 800 PNs is assumed in all cases. Left, surface 
plot; right, contour plot. Contour intervals are identical within each plot. 



Dash-dot lines indicate ridge contours. For clarity, isoline values are 
sometimes indicated beside plot (when contour lines are too dense for 
inline labeling). (A) Mean aggregate input per KC, * [units of PNs]; contour 
interval, 40. (B) Variance of aggregate input per KC, A [units of PNsl; 
contour interval, 10. (C) Covariance between aggregate inputs to two KC 
[units of PNs]; contour interval, 10. (D) Correlation coefficient between 
aggregate inputs to two KC [unitless]; contour interval, 0.05. 
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To what extent are aggregate inputs to two KCs correlated 
with each other? Calculating their covariance (Figure 4C) we get 
(Appendix A3): 



(cov(fc r , k t )) r 



(k r )s) (h ~ (k t )s))s) r 



N N 
,= l;=l,Mi 



i = 1 

N 



i = 1 

N 

■*EK"U f ( s ;>s + * 2 = Nc2 P( 1 -P) 



and their correlation coefficient (Figure 4D) is: 



W r W, / cov(fc,fc t ) \ = 

\Vvar(fc r )-var(fcf)/ r7tf / 



Nc 2 p(l -p) 



(Npc{l-pc)f 



c — pc 
1 — pc 

Note that both covariance and correlation coefficient have non- 
negative values in our model (as p and c are probabilities, 
1 > p, c > 0, and N is the number of PNs, N > 0); this is 
expected in a network with architecture as described — with all 
connections feed-forward and excitatory — and with no correla- 
tions assumed between external inputs to the system. For c = 1, 
the correlation coefficient is 1 (as all KCs see the exact same 
input); for c = Vi the correlation coefficient is jEp> ran g m g 



between 0 and Vi. 



MODEL RESULTS III: INTER-KC DIFFERENCE IS MAXIMAL FOR 
CONNECTIVITY Vi 

We now touch a fundamental question: for given network param- 
eters, how much do target neurons differ from each other in 
their aggregate inputs? This will tell us how much two KCs dif- 
fer in sub-threshold membrane-potentials within a given cycle 
in the active network (earlier we asked how connectivity vec- 
tors differ; Section "Model Results I: Exploring Properties of the 
Connectivity Matrix"). Let us calculate the difference, D(X, Y) 
X — Y\, between aggregate inputs to two KCs (see Appendix A4 
for alternative derivation): 



D(k r ,k t ) = {{\k r 

= N- {{(W n S,) 2 



N- {{(W rt S 1 -W ti S i )\) r 



2W ri W tl S 2 + (W tl S,) 2 y s ) r ^ t 



N-[{W n ) r ^ t - (S,)s-2(W n W tl ) r ^ t 



= N ■ (cp - 2c 2 p + cp) = 2pNc{\ - c) 

It is straightforward to see that when taken as a function of 
PN-KC connection probability, D is maximal for c = Vi; this holds 
for any positive p and N (i.e., for all biologically relevant cases, 
with non-zero PN-firing probability and more than zero PNs in 
the network). The behavior of D as a function of p and c is shown 
in Figure 5A, and in normalized form in Figure 5B. 

The above proves that when each target-cell samples half of 
the source-neurons, the mean difference between inputs to any 
two targets is maximized. Stated differently, each KC is on aver- 
age maximally different from all other KCs in the information it 
carries about the external state. 

INTER-KC DIFFERENCE: PLUGGING IN REAL-DATA VALUES 

We can now introduce the values measured experimentally in the 
locust into our model. At baseline, PNs typically fire at ~2.5-4 Hz 
(Perez-Orive et al., 2002; Mazor and Laurent, 2005). The rele- 
vant integration time window for KCs is the 50 ms odor-induced 
oscillation cycle (Perez-Orive et al., 2002); even in the lack of 
oscillations EPSPs in KCs have a time course of several tens of 
milliseconds (Jortner et al, 2007). This provides a crude estimate 
of p, the probability of spiking within the relevant time window: 

p « 0.125-0.2 (Perez-Orive et al, 2002; Mazor and Laurent, 
2005); 

c ~ 0.5 (PN-KC connectivity measurements; Jortner et al., 
2007); 

N ~ 800 (axon count in the PN-KC tract; Leitch and Laurent, 
1996). 

Introducing these numbers into D = 2pNc{\ — c), the mean 
difference between two KCs is equivalent to 50-80 PN inputs 
(Figure 5C). If only 100 PNs converged onto each KC (c = 0. 125), 
the mean difference would be 22-35 PNs, and with only 10 
PNs per KC (c = 0.0125, as previously estimated; Perez-Orive 
et al., 2002), it would be equivalent to only 2.4-4 PNs at baseline 
(Figure 5C)! 

During odor presentation, average PN firing-rates do not 
change significantly over the population (Mazor and Laurent, 
2005). However, as PN-spikes are now confined to about half 
the oscillation cycle (the rising phase; Laurent and Davidowitz, 
1994; Laurent et al, 1996; Wehr and Laurent, 1996), p effectively 
increases by ~factor 2 (by virtue of the time window "shrink- 
ing"). The mean difference between two KCs thus increases to 
100-160 PNs during odor; if the fan-in were 100 PNs per KC 
(c = 0.125), or 10 PNs per KC (c = 0.0125), the mean difference 
would become 44-70 PNs, or 5-8 PNs, respectively. 
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FIGURE 5 | Distance between KC inputs (or membrane-potentials). 

(A) Difference (D) between aggregate inputs to two different KCs (averaged 
over all pairs of different KCs and over all antennal-lobe states) as a function 
of PN-firing probability p and PN-KC connection probability c. D is a parabola 
in c (with maximum at c = V2) and increases linearly with p and with W. 
N = 800 PNs is assumed. D is in PN inputs. Left, surface plot; right, contour 



plot, contour interval is 20. (B) Normalized difference and covariance 
between KCs as a function of connectivity c; normalized D and cov are 
unitless, independent of N and p and vary between 0 and 1. (C) Predicted 
difference between two KCs for PN-firing parameters measured of in the 
locust olfactory system (at both extremes of the range): N = 800, p= 0.125 
(2.5 Hz, red), and p = 0.2 (4 Hz, magenta). 



MODEL RESULTS IV: ESTIMATING FIRING THRESHOLD AND 
SPARSENESS 

The above observations do not yet relate to KC response prop- 
erties, as we up to now ignored membrane non-linearities and 
spiking. What happens when we impose a firing threshold, 
and assume the KC spikes once it is crossed? We now use the 
assumption of independence across PNs, and the fact that many 
of them respond to each odor and during each cycle (accord- 
ing to this model N ■ p per time window, or 100-160 PNs for 
values N = 800, p = 0.125-0.2. According to experimental data, 
100-150 PNs fire per cycle; Mazor and Laurent, 2005). With these 
assumptions, we can apply the Central Limit Theorem (CLT) to 
the summation of inputs onto a KC: we can treat k as a Gaussian 
random variable, fully defined by its mean (*) and variance (A) 
which we calculated (Section "Model Results II: Neuronal Activity 
and Properties of Input to KCs"): 

N 

k = £ W ljSj 

./= 1 

ki ~ Norm(*, A) = Norm(Npc, Npc(l — pc)) 

where NormQC Y) stands for a Normal distribution with mean 
X and variance Y. So for a given threshold / (in units of PN 
inputs), the probability of the zth KC crossing the threshold 
(i.e., spiking) is: 



, r -(x - 1») 2 

Mk >/) = ^= / e—^dx 
f 

= '-7k/^* = i-Q(S) 

— oo 

where Q(z) denotes the Normal cumulative distribution function 
(CDF) of variable z (Figure 6A). If the firing threshold of KCs/ is 
set to: 

/ = * + z . Va 

this will result in a known and defined fraction of all KCs — 
equal to the area of the tail of the Gaussian [given by the CDF, 
1 — Q(z)] — crossing the firing threshold for a given PN- 
population state. Similarly, a given KC will cross threshold / in 
response to a known fraction [again, the area of the tail 1 — 
Q(z)] of all PN-population states. This function, illustrated in 
Figure 6A, thus links the KC firing threshold with both the sparse- 
ness and the selectivity of the mushroom body neural code (as 
described in the Introduction). 

To demonstrate the usage of this function: a hypothetical 
feed-forward network which satisfies our assumptions (Section 
"Model Assumptions") has parameters N = 100, p = 0.5, and 
c = 0.2. If we wish to "design" a population of target neurons 
with a particular level of sparseness (say we want each respond- 
ing to 2.3% of external states), we will set their firing thresholds 
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FIGURE 6 | Linking KC firing threshold and response properties. 

(A) Dependence of Kenyon cell response sparseness on firing threshold. 
Kenyon cell aggregate input, or membrane-potential, k, behaves as a 
Gaussian random variable with mean * and variance A under model 
assumptions; KC-response probability is the probability of aggregate input 
crossing firing threshold f (red shaded area). This defines both 
single-neuron selectivity (i.e., the average fraction of states which evoke 
firing in a given cell) as well as population sparseness (i.e., the fraction of 
KCs responding to an average state). (B) KC responses to partially 
overlapping PN activity patterns. Pairs of PN activity vectors differing from 
each other by exactly d bits present a KC with quantifiably different 
aggregate inputs. The summed probabilities of the KC responding to the 
first pattern and not the second (brown) or vice versa (violet) reflect the 
Hamming distance between KC activity vectors. This function thus 
expresses the difference between target-states given the difference 
between source-states. (C-D) Difference between Kenyon cell activity 
vectors as a function of difference between PN activity vectors. Difference 
is expressed as Hamming distance normalized by the vector-length (or 
number of neurons in the population). The nine different curves represent 
different KC firing thresholds, equivalent to (left to right) [0, 1 , 2, . . . , 8] 
standard deviations above the mean aggregate input. Above the diagonal 
(stippled line) are regimes where Kenyon cell activity vectors differ from 
each other more than the corresponding PN activity vectors: i.e., where 
pattern-separation occurs. Below the diagonal are regimes where different 
PN activity vectors evoke relatively close KC activity vectors: i.e., 
generalization/retrieval occurs. Parameters of the locust olfactory system 
are used, and (C) and (D) differ by PN-firing rate: (C), p = 0.2, 
corresponding to PNs firing at 4Hz; (D), p = 0.125, corresponding to PNs 
firing at 2.5 Hz; Note how increasing threshold affects curve shape and 
relative regimes of generalization vs. discrimination. 



to be equivalent to 2 standard deviations above their mean 
aggregate-input (as 1 — Q(2) = 0.023), or: 

f = Npc + 2- jNpc(l -pc) 

= 10 + 2\/9 = 16 simultaneous inputs. 



A basic, gross prediction which naturally emerges from the 
threshold-sparseness function is that when the target neurons' 
threshold is equal to their mean aggregate input (*), each tar- 
get neuron responds to half of all external states [as Q(0) = 
1 — Q(0) = 0.5]; this is the most-distributed code possible (also 
known as a holographic code; Foldiak, 2002), and thus sparse 
coding is conditional on a threshold significantly higher than that, 
requiring/ >> 

THRESHOLD ESTIMATION: PLUGGING IN REAL-DATA VALUES 

Let us now test the predictions on firing threshold and sparseness 
using experimental parameters from the locust. Section "Inter- 
KC Difference: Plugging in Real-Data Values" shows the network 
parameters for PN-firing rates {p ~ 0.125-0.2), PN-KC connec- 
tivity (c « 0.5) and PN number (N » 800). Given these, the 
aggregate input a KC gets is on average 

= Npc * 50-80 PN inputs per cycle 

and its standard deviation is: 

Va = y/Npc(l — pc) « 7-8 PN inputs per cycle 

So for KCs to each respond to ~1% of all PN-population states, 
their threshold has to be ~2.5 SDs above the mean aggregate 
input:/ ~ 98-101 PN-inputs for firing rate of0.2 PN-spikes/cycle 
(4 Hz), or/« 67-69 PN inputs for 0.125 PN-spikes per cycle 
(2.5 Hz). 

This result is in good agreement with experimental measure- 
ments of KC-response sparseness (~l-2% using intracellular 
recordings; Jortner, 2009) and their firing threshold (/ ~ 100 
assuming linear summation and full PN-synchrony; Jortner et al., 
2007). The estimate's deviation toward the higher end of the pre- 
dicted range may be due to supra-linear summation in KCs at 
depolarized membrane-potentials (Laurent and Naraghi, 1994; 
Perez-Orive et al, 2002, 2004). 

MODEL RESULTS V: NOISE TOLERANCE AND GENERALIZATION 

Olfactory stimuli are by nature noisy and variable. PNs show sig- 
nificant trial-to-trial variability when presented with the same 
odor repeatedly, yet in KCs noise is considerably reduced. Another 
issue is that some stimuli are similar to each other (either because 
of chemically related odorant molecules, or because they are 
mixtures with overlapping components) and others are differ- 
ent. Both points — the way the system tolerates noise and the 
way it encodes similar or different inputs — are closely related, in 
that both require us to examine how two overlapping PN-firing 
patterns are transformed into KC firing patterns. 

Recall S, the set of all possible activity vectors of the source- 
population. Let us define { S , S } as the subset of vector-pairs in § 
which differ from each other by exactly rf-bits. Formally then: 

{S, S€ S\H{S, S) = d) 

The aggregate-input vectors to the target-neuron population 
which are evoked by S , S will be k , k , respectively; vec- 
tors K and K will be the resulting activity vectors of the 
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target-population. Aggregate inputs which a single KC gets in 
response to S , S will differ by: 

\kr-U 



(D(kr, k r )) ^ ^ 
{S, S} 



r,(S,S) 



N- ((WrfSj-WriSj) 2 ) ^ 
< h,[S,S) 



N-i W, 



2W ri 



{S,S} 



IS,S) 



[S.SJ 



Nc- 



(S.S) 



and so 



k r = k r =p 2cpd 



Earlier we linked KC aggregate input and firing threshold to their 
firing probability (Section "Model Results IV: Estimating Firing 
Threshold and Sparseness"; Figure 6A); let us use the same for- 
malism now. The probability that a single KC responds differently 
to two PN-states is simply the probability one of these states 
drives it across the threshold and the other does not. This is 
demonstrated graphically in Figure 6B and is exactly the rationale 
behind the following calculation: 

(Diki,ko)t^ ={\ki 

{S,S} 



{5, SI 



Pr(Ki = 1, k { = 0) + Pr(K; = 0, K, = l) 
Pr(*,->/, h </) + Fr(fci </, h>f) 
Pr(fc; >f,h<f+ 2c pd) + ■■■ 
+ Pr(fc; <f,ki>f- 2cpd) 

f+2cpd 

/ -(* - *) 2 , 
e 2A dx+ .. . 



/A2n 



7m / 



-(X - f) z 



dx 



f-2cpd 

What about the activity vectors for the KC population, given 
similar PN input? The mean Hamming distance between two 
KC activity-patterns given Hamming distance d between the PN 
activity patterns will simply be the above expression multiplied by 
the number of KCs, M. We can thus write: 



[H(K, K) 



H(S, S) = d 



= „( Q (£±^)- Q (/^p)) 



NOISE TOLERANCE AND GENERALIZATION: PLUGGING IN REAL-DATA 
VALUES 

So how well does the locust olfactory system tolerate noise? The 
results are shown in Figures 6C,D, where I feed into the rela- 
tion derived in the previous section the parameters from the 
locust circuitry. Two PN activity-patterns, differing by 0-800 bits 
(x axis, normalized to 0-1 ) evoke KC activity-patterns differing by 
0-50,000 bits (y axis, normalized to 0-1). The diagonal (stippled 
lines) in both figures shows where the hypothetical curve would 
pass if normalized distance between representations would not 
change in transition from PNs to KCs. In fact, the relation has a 
sigmoid shape, meaning PN patterns close to each other become 
even closer in the KC population; whereas PN patterns which are 
different become more different (note that distances are normal- 
ized to the population size). The nine different sigmoid curves 
show the relation between input- and output-overlap when the 
firing threshold is varied (left to right: 0-8 standard deviations 
above the mean aggregate input). The setting of the firing thresh- 
old clearly controls the boundary between generalization and 
discrimination; a boundary which is surprisingly sharp. 

In the locust olfactory system, the KC threshold is located 
~2.5 SDs above the mean aggregate input (commensurate with 
a sparseness of ~1% as observed; see Section "Threshold 
Estimation: Plugging in Real-Data Values"). As seen in 
Figures 6C,D, for this value the Kenyon cell population 
generalizes (or, tolerates noise) for PN patterns which are within 
up to ~50-100 bits away from the PN-KC connectivity vectors, 
and discriminates for ones which are farther. This means, that 
with parameters from the locust, the boundary between discrim- 
ination and generalization lies in a biologically realistic regime 
for highly sparse coding (recall that for binary 800-dimensional 
vectors, over 99.9% of space is removed 350-400 bits from any 
given vector; Figure 2). 

SUMMARY OF MODEL RESULTS AND PREDICTIONS 

This analytic model produces several insights and predictions, 
applicable to both the locust olfactory circuitry as well as to 
feed-forward systems in general. As the model was designed 
with generality in mind, its results depend only minimally on 
particular parameter values. Here is a brief summary: 

( 1 ) In a feed-forward system with random connectivity, pairs of 
connectivity vectors from source- to target-population have 
a maximal Hamming distance for connection probability Vi. 

(2) Hamming distances between connectivity vectors are 
mostly very similar to each other and to their mean value; 
connectivity vectors significantly more similar to each other 
(or, more different from each other) will be extremely rare 
(negligible). 

(3) Differences in aggregate input (or sub-threshold 
membrane-potential) between target neurons are maximal 
for c = Vi. In the locust antennal-lobe-mushroom-body 
circuit, where such connectivity is realized, pairs of KCs thus 
differ from each other by an equivalent of 50-80 PN-inputs 
during baseline, and of 100-160 PN-inputs when odor is 
presented. These differences decrease significantly when 
connectivity shifts away from l A (in either direction): with 
connectivity of 10 PNs per KC (as previously estimated; 
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Perez-Orive et al., 2002) differences between KCs would be 
equivalent to only 2.4-4 PN inputs (~factor 20 lower than 
forc = %). 

(4) The standard deviation of sub-threshold membrane- 
potential in target neurons is maximal when the prod- 
uct of spiking probability in the source-neurons (p) and 
connectivity between the source- and target-populations (c) 
is Vi. In locust KCs, the standard deviation of membrane- 
potential is predicted to be equivalent to the sum of 7-8 
PN-inputs, or ~0.6-0.7 mV, in good agreement with exper- 
imental measurements. 

(5) The covariance of aggregate inputs to two different target 
neurons will be maximal forp = Vi, and will increase as ~c 2 . 

(6) Both the covariance and correlation coefficient between tar- 
get neurons are predicted to be always positive under the 
assumptions taken. This is intuitive, given that no correla- 
tions were assumed in the external input driving the system, 
and only feed-forward excitatory connections exist. 

(7) The correlation coefficient between target neuron 
membrane-potentials is expected to range within 0-0.5 
for c = Vi. Particularly, in the locust, where c = Vi and 
PN-spiking probability is 0.125-0.2 per cycle (2.5-4 Hz), 
correlation coefficients between KCs are predicted to 
be 0.4-0.5. This remains to be tested experimentally 
with dual-intracellular KC recordings. A related test — 
namely measurements of correlations between single-KC 
membrane-potentials and local-field-potentials — yielded 
correlation coefficients around 0.3 (Jortner et al, 2007). 

(8) The response probability (and sparseness) of target neurons 
in a feed-forward system with parameters N, p, c is deter- 
mined by their firing threshold, and is well approximated by 
the area of a Gaussian tail. The threshold-sparseness func- 
tion predicts the fraction of states a target cell responds 
to, and the fraction of target cells responding to any given 
state. It generates the basic prediction that for a threshold 
equivalent to the target neurons' mean aggregate input, * (a 
product of source-neuron firing rate, source-neuron num- 
ber and connectivity), target neurons will respond to Vi of 
all source-population states; so to produce sparse coding the 
threshold must exceed that value: / > > * . 

(9) Applying the threshold-sparseness function to the locust 
olfactory system, the firing threshold measured (~100 
inputs, assuming perfect synchrony and linear summation) 
well predicts the measured KC sparseness level (~l-2%) 
and vice-versa (1% sparseness predicts a threshold of 
~70-100 inputs, depending on PN-firing rate). 

(10) Given a network with parameters N, p, and c = Vi, if target 

neurons have a firing threshold of/ (z) = Np+Z — V N K 2 p ) 
(see Appendix A5), then each target neuron will respond 
to 1 — Q(z) of source-population states, and different tar- 
get neurons will respond to maximally different states. The 
difference between the target neurons will on average be 
— 2 p . Combined with adaptive gain control to ensure that 
/ is changed appropriately when p changes (Papadopoulou 
et al., 2011), this yields a simple way to design a network 
with an arbitrarily sparse level of activity, and with specific 
and reliable neural responses to external states. 



DISCUSSION: LINKING NETWORK ARCHITECTURE 
AND NEURAL CODING IN THE 
ANTENNAL-L0BE-MUSHR00M-B0DY CIRCUIT 

Integrating theory and experiment, I here discuss how the archi- 
tecture of the locust olfactory system gives rise to Kenyon cell 
coding properties: specificity, reliability, low firing rates, corre- 
lations, and sparseness, and how these can be utilized to build 
higher-level representations of the animal's world. Several predic- 
tions with potentially broader implications will follow. 

CONNECTIVITY Vz MAXIMIZES DIFFERENCES BETWEEN TARGET 
NEURONS 

The key experimental finding motivating this study was that each 
KC in the mushroom body receives synaptic connections from 
antennal-lobe PNs with probability Vz, each thus sampling 400 
of the 800 PNs (Jortner et al., 2007). At first, this result may seem 
very surprising — because it seems counterintuitive that KC speci- 
ficity could arise from such broad PN inputs. It makes sense, 
however, when viewed from a combinatorics perspective: the 
number of ways to pick n elements out of N is given by the 
binomial coefficient: 



n\(N -«)! 



This expression is maximal for n = N/2, decreasing sharply 
and symmetrically around it. The fundamental realization that 
choosing half the elements maximizes the number of possi- 
ble combinations has dawned independently on several thinkers 
throughout history — from Pingala (India, 2nd-5th century BC, 
commentated by Halayudha, 10th century AD), through Al Karaji 
(Persia, 953-1029), Omar Khayyam (Persia, 1048-1131), Yang 
Hui (China, 1238-1298), Niccolo Tartaglia (Italy, 1500-1557) to 
Blaise Pascal (France, 1655). 

How is this relevant to the olfactory system? Think of each KC 
as if picking the PNs it will listen to. If each KC sampled only 
n = 1 of N = 800 PNs, there would be exactly 800 ways to pick 
which PN to sample (similarly, if each KC sampled 799 out of the 
800 PNs; where there would be exactly 800 ways to pick which PN 
not to sample). However, when sampling half the PNs, n = 400, 
the number of ways to do so is maximal, and equals 800!/400!400! 
~10 240 . This is an immense number — beyond astronomical — 
and way too large for any example from nature to demonstrate 
it. It is roughly equivalent to the number of atoms in the known 
universe (estimated at ~10 79 ) taken to the power of three. . . 

But as there are only 50,000 KCs in the locust mushroom 
body, only 5 • 10 4 combinations are realized out of this vast pool 
of possibilities. What is the probability that two randomly cho- 
sen KCs sample the exact same PN-ensemble? The answer is 
s^lO^ 240 , j s f or a \\ practical purposes zero. And what is 

the probability that two KCs sample very similar PN-ensembles — 
that is, ensembles differing from each other by just one, or 
2 or 3 inputs? The answers are 10~ 238 , 10~ 235 , and 10~ 233 , 
respectively — all vanishingly small. In fact, the average pair of KCs 
will differ by ~400 PN inputs (Figure 2C), which also constitutes 
the most common case (occurring with probability 0.028), and 
99.97% of KC pairs will deviate from it by less than 50 inputs 
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(Figures 2C,D). This stems from a key property of binomial dis- 
tributions with large N: most of their mass occupies a very narrow 
band around their mean. 

By this reasoning (proven for generalized cases in Sections 
"Model Results I: Exploring Properties of the Connectivity 
Matrix" and "Model Results III: Inter-KC Difference is Maximal 
for Connectivity W for connection vectors and membrane- 
potentials, respectively) each target neuron receives a unique set 
of source-neuron inputs, very different from that of all other 
target neurons. KCs are maximally different from each other in 
what they tell us about the world of inputs, because their con- 
nectivity vectors are drawn from a pool which is maximal. This 
feature of the KC population results directly from combinatorics, 
and from the probability Vi of receiving connections from their 
source-neurons (Figure 3). 

A critical comment raised by several colleagues against the 
above argument is that while this architecture indeed maximizes 
input separation, this optimum cannot reflect on biological real- 
ity. The brain, they argue, could not come so close to it, because 
the numbers in question are too large to be distinguished from 
each other by a biological system. In other words, realizing 50,000 
combinations of "only" 10 22 (the number of ways to pick 10 PNs 
from 800, corresponding to connection probability c = 0.0125) 
would be already immensely sparse; and for all practical purposes 
10 240 (the number of ways to pick 400 from 800) is not "sparser." 
Moreover, since the mathematical optimum is not necessary, evo- 
lution of such connectivity couldn't have possibly been guided by 
biological selection pressures. 

The results presented here (Section "Inter-KC Difference: 
Plugging in Real-Data Values"; Figure 5) refute this criticism. 

"800" 



While indeed the binomial coefficient 



rises very steeply 



with m and soon produces vast numbers, these numbers directly 
translate into state-dependent differences in aggregate input, or 
membrane-potential, produced across KCs (Figure 5B, normal- 
ized difference; Figure 5C difference in inputs). If 80 PNs were to 
connect to each KC (corresponding to c = 0.1), the amount by 
which aggregate inputs to different KCs would differ — and the 
system's ability to discriminate between external states — would 
drop approximately to a third of the optimum, and for 10 PNs 
per KC (c = 0.0125) it would drop by a factor of 20. When 
translated to membrane-potential differences between KCs, this 
may be critical for readout, especially in the presence of noise. 
This maximum is thus likely to be meaningful after all, and may 
account for the exquisitely clean performance of sparse neural 
systems feeding on noisy input. 

WHAT DETERMINES HOW SPARSE THE CODE WILL BE? 

KC aggregate inputs differ maximally as a result of the PN-KC 
connectivity; yet while this property paves the road toward sparse 
coding, it does not in itself suffice to explain the KCs' rare fir- 
ing: it is eventually their firing threshold which determines firing 
probability and response sparseness (Section "Model Results IV: 
Estimating Firing Threshold and Sparseness"). With such high 
convergence ratio (400:1), target cells can afford to have a very 
high firing threshold, which can account for KC specificity, reli- 
ability, and low firing rates. Experimental measurements show 



that KC firing threshold is equivalent to simultaneous activation 
of ~100 PN inputs assuming linear summation (Jortner et al., 
2007). An estimate based on intracellular recordings (and thus 
less biased than extracellular studies, as it also captures cells fir- 
ing rarely or not at all) suggests KCs respond to 1-2% of odors 
tested (Jortner, 2009). Here, a theoretical function was derived 
which links firing threshold and response probability (Section 
"Model Results IV: Estimating Firing Threshold and Sparseness"): 
it closely predicts the experimental results, estimating the firing 
threshold necessary to achieve ~1% KC sparseness at ~67- 
101 inputs, depending on PN-firing rates (Section "Threshold 
Estimation: Plugging in Real-data Values"). 

The threshold-sparseness function is quite sensitive to activ- 
ity levels of the input network (Huerta et al., 2004; Jortner et al., 
2007; Nowotny, 2009). Since PN population activity produces a 
range (100-150) of spikes per cycle (Mazor and Laurent, 2005), 
this can result in instability of the code — causing some external 
states to activate a large number of KCs and others to activate 
none at all (Papadopoulou et al., 2011). This requires adaptive 
gain control of the KC firing threshold to fit the actual activ- 
ity level of the input; one mechanism shown to maintain output 
sparseness over a wide range of input conditions in the locust 
takes place via a large, non-spiking GABAergic interneuron with 
extensive connectivity and graded release properties; it forms a 
negative-feedback loop onto KCs and adaptively regulates their 
population output on a cycle-to-cycle basis (Leitch and Laurent, 
1996; Papadopoulou et al. , 20 1 1 ) . 

In a theoretical exploration of the hippocampus, O'Reilly and 
McClelland (1994) also find that a "floating threshold" (as they 
phrase it) is highly useful for determining response sparseness 
under different input conditions and postulate that adjustment 
of the threshold can be useful for shifting between pattern- 
separation (or discrimination, or new-category formation) and 
pattern-completion (or generalization, or recall). 

It should be noted that the threshold-sparseness function 
derived here is independent from the results on connectivity, and 
it can be applied to systems with any parameters. 

EFFECTS OF PN-KC CONVERGENCE: RELIABILITY AND CORRELATIONS 

While overall PN-KC circuit-architecture is highly divergent due 
to the increase in dimensionality, the connectivity scheme makes 
single KCs receive massively convergent input (from 400 PNs). 
Together with the high and adaptive KC-threshold, this con- 
vergence sub-serves the KCs' reliable, low-noise performance: 
summing many PN-inputs prior to KC threshold (/) crossing is 
equivalent to massive averaging of PN activity. This reduces the 
significant variability (i.e., noise) present in individual, cycle-wise 
PN responses by a factor of l lJ]\ in locust KCs, where / ~100, 
noise is thus reduced ~ 10-fold. 

Another interesting effect of this convergence is the coexis- 
tence of correlations and differences in the mushroom body. 
While membrane potential-differences between different KCs are 
maximized, they are still predicted by the model to co-vary sig- 
nificantly (Figure 5B), with correlation coefficients of 0.4-0.5 
[see Section "Summary of Model Results and Predictions"; 
Prediction (7)]. Indeed, while the mushroom body code is highly 
sparse and specific (Perez-Orive et al., 2002; Jortner, 2009), a 
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salient property of KC intracellular membrane potentials is their 
strong correlations with the mushroom body local field potential 
(Laurent and Naraghi, 1994; Jortner et al, 2007), implying that 
they are also highly correlated with each other. How can strong 
correlations between cells, which we naturally tend to associate 
with similarity, exist side by side with maximal difference between 
them? 

To answer this apparent paradox we examine the inputs 
KCs receive vs. the outputs they produce. Correlations between 
membrane-potentials of two KCs result from massive overlap in 
their aggregate input: they share on average ~200 incoming PN- 
synapses, and 25-40 active PN-inputs per oscillation cycle. The 
relevant feature for the system is, however, the number of inputs 
by which they do not overlap (Figure 3): each also receives on 
average ~200 PN synapses (25-40 active ones per cycle) which 
the other does not; so they differ by 400 synapses (50-80 active 
inputs per cycle). The non-linearity imposed by the KC-threshold 
makes the two properties — correlations and difference — strongly 
diverge at this point: two KCs can get highly correlated inputs, yet 
may easily sit across different sides of the threshold, which in turn 
determines who will fire and who won't; both correlations and 
differences can thus coexist between them. 

The general message is that while sub-threshold correlations 
naturally arise from input overlap in highly interconnected sys- 
tems, they do not necessarily imply similarity in function (or 
output) between neurons; depending on network design and 
on the parameter taken as readout, the non-overlapping input 
may outweigh the overlap (as shown for KCs). Eventually, the 
non-linearity of thresholding enables brains to parse the world 
into percepts and build representations from them. Membrane- 
potential correlations between KCs may in this case be side effects 
of the interconnected architecture, rather than a computational 
feature of the code. 

NEURAL DESIGN-PRINCIPLES FOR GENERATING A SPARSE CODE 

As pointed out in the Introduction, a prerequisite for understand- 
ing a neural system is characterizing its basic features — individual 
components, connectivity and external input. Formulating higher 
properties in terms of these features bridges levels of descrip- 
tion and thus constitutes deeper understanding. This approach 
was used here link network design and sparse coding in the 
antennal-lobe-mushroom-body circuit. The experimentally mea- 
sured parameters f, c, p — corresponding exactly to the individ- 
ual unit input-output function, connectivity and input — were 
used to express distance-measures between connectivity vectors 
and between target neurons, sub-threshold behavior and coding 
sparseness. 

Three main principles govern the design of the antennal-lobe- 
mushroom-body circuit: First, there are many more target cells 
than source cells (~50,000 vs. ~800); a factor ~10 2 increase in 
dimensionality between the two odor-representations. Second, 
the probability of connection between the principal neurons of 
both relays is each target thus samples ~400 of 800 sources. 
Third, target-cell firing threshold is high, equivalent to simulta- 
neously activating ~100 of their inputs, and can be fine-tuned to 
fit different activity-levels of the source network. 

Due to the high threshold, each external state (or here, PN 
activity pattern) activates only a small subset of KCs. However, 



due to the connectivity scheme, different external states activate 
different KC-subsets. The activation of very small, very different 
subsets of cells in response to different external states suffices to 
produce a sparse and selective neural code as we defined it (see 
"Introduction", and Jortner et al, 2007). At the same time, the 
high PN convergence onto individual KCs explains why KCs are 
so reliable on the one hand, and on the other hand why their 
membrane-potentials are noticeably correlated with the local- 
field potential (Laurent and Naraghi, 1994; Perez-Orive et al., 
2002; Jortner et al., 2007), an observation that initially seemed 
paradoxical (Jortner et al, 2007). Finally, with thresholds so 
high, it is not surprising that the chances of "accidental" spik- 
ing are very small, and that KC spontaneous-firing rates are 
extremely low. 

The design principles described here thus lead to reliable, 
specific — sparse as well as selective — representations of random 
olfactory-percepts in the mushroom body, and form a simple way 
to make a sparse code spontaneously emerge, with no need for a 
"guiding hand" such as learning or predetermined connections. 

The total number of KCs, their fraction activated per external 
state, the levels of noise, and the cost function of classification 
errors will together determine how state-space is tiled — or how 
many odors can be reliably encoded by the mushroom bodies 
(and also, how many KCs are needed to encode a certain number 
of odors). A meaningful estimate is beyond the scope of this work, 
as a critical parameter — how distant KC representations must be 
from each other within noise constraints — is unknown. 

Directly from the above principles emerges a simple recipe 
for designing networks with optimal separation of representa- 
tions and arbitrarily specific responses. If two neuronal pop- 
ulations have feed-forward connectivity with probability Vi, N 
source-neurons firing p spikes per characteristic time, and target 

neuron firing threshold is equivalent to / (z) = Np + Z — V- N P( 2 P ) 
(Appendix A5), then each target neuron will respond to a known 
proportion of source-population states (the area under Gaussian 
tail 1 — Q(z)), and different neurons will respond to maximally 
different states. Adaptive gain control should be implemented to 
ensure / changes appropriately when p changes (Papadopoulou 
et al., 2011). At any given time, target neurons' aggregate inputs 
(momentary membrane-potentials) will on average differ by . 

GENERALIZATION vs. DISCRIMINATION 

An inherent dilemma when parsing sensory input is where to 
draw category-lines. Sometimes a stimulus must be recognized — 
i.e., grouped into an already-existing category — even if it has 
never been previously encountered in the exact same form. This 
allows recognition of sensory stimuli in the presence of noise, as 
well as grouping things together into meaningful categories (i.e., 
generalization), both of which are essential requirements for the 
brain to perform its tasks. 

In other cases, stimuli which may be very close to each other 
need to be told apart. Discrimination is critical when selecting 
food, for example. An extreme case is when the system needs 
to decide that something is completely novel and merits a new 
category of its own. 

It is important to recognize that these tasks — discrimination 
and generalization — contradict each other to some extent, yet 
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sensory systems need to be able to do both, and sometimes on the 
very same stimulus: something smells like a fruit (generalization), 
but clearly does not smell like an apple, though (discrimination). 

The model presented here provides some intuition on how this 
may happen. As shown in Figures 6C,D, the same network can 
perform both tasks: with the sigmoid-shaped relation between 
source- and target-separation, stimuli close to each other at the 
source-layer will be generalized by the population, whereas stim- 
uli farther from each other will be discriminated. The boundary 
between discrimination and generalization is rather sharp; and 
its location is determined by the firing threshold, which can be 
adapted (Papadopoulou et al., 201 1). 

KENYON CELLS CAN SERVE AS BUILDING BLOCKS FOR MEANINGFUL 
(AND PLASTIC) REPRESENTATIONS AT THEIR TARGETS 

The basic question we began our journey with is how the brain 
creates specific, high-level, and eventually ecologically meaning- 
ful percepts. The antennal-lobe-mushroom-body transformation 
described above achieves a major step in this direction by sep- 
arating representations and giving rise to specific and random 
responses. However, it remains to be discussed how these ran- 
dom response properties lead to ecologically relevant percepts, 
and how this fits into the mushroom body's widely accepted role 
in learning [reviewed in Heisenberg (1998)]. 

The distribution of connection strengths between PNs and 
KCs is rather narrow (Jortner et al, 2007); in addition PN-KC 
synapses show no short-term plasticity, such as homo- or hetero- 
synaptic facilitation or depression (Jortner et al, 2007). While 
these observations do not rule plasticity out altogether, they def- 
initely do not support plasticity playing a key role at PN-KC 
synapses. 

What happens at the transformation to the next relay? 
Dendritic trees of P-lobe neurons (one of the main classes of 
mushroom body outputs) are planar and oriented perpendicular 
to the KC-axon tract; this structure suggests that f$-lobe neurons 
can integrate precisely timed neural activity over a potentially 
wide subpopulation of KCs (Li and Strausfeld, 1997; MacLeod 
et al, 1998). Cassenaer and Laurent (2007) showed that connec- 
tivity from KCs to fi-lobe neurons is low (~2%), individual active 
synapses are relatively strong (1.58 ± 1.11 mV) and exhibit salient 
spike-timing dependent plasticity, which is sensitive even to single 
action-potentials. 

It is thus attractive to imagine the transformation of informa- 
tion from the antennal-lobe to the mushroom body as happening 
via widespread, random (or partially random) and largely fixed 
connections — designed to spread neuronal information opti- 
mally and create discrete, specific and reliable representations of 
random features. This would prepare it for further computation 
in downstream areas, such as the f$-lobe — where more complex 
ideas can then be constructed from these elementary building 
blocks, much like words and phrases are constructed from an 
alphabet (Barlow, 1972; Stryker, 1992). Hence as different KCs 
respond specifically to various and different chemicals (or classes 
of chemicals), proper wiring of connections and selection/tuning 
of their strengths can generate high-level, invariant and "mean- 
ingful" representations. For example, a hypothetical downstream 
neuron responding only to odors associated with locust foods 



could easily be constructed by connecting onto it only KCs firing 
in response to various 5- and 6-carbon chained alcohols, alde- 
hydes, and esters which are common odorants in grassy plants 
(cheerfully nicknamed "green odors"; Hopkins and Young, 1990; 
Bernays and Chapman, 1994). Similarly, some downstream neu- 
rons can respond to odors indicating plant toxicity (for examples 
of such chemical cues see Cottee et al, 1988). 

At this downstream stage, learning (i.e., tweaking of incom- 
ing synapses from KCs) can shape and tune these represen- 
tations, molding them to the animal's specific surroundings. 
Locusts, as many other generalist animals, readily adapt their 
food-preferences to seasonal- and regional-variation of plants, 
their nutritional value and the animal's needs (see Cooper- 
Driver et al., 1977; Bernays et al., 1992; Bernays and Chapman, 
1994), and learning plays an important role in this (Dukas and 
Bernays, 2000; Behmer et al., 2005). It is likely, that learn- 
ing a different food-preference is accomplished by changes 
at KC-fi-lobe synapses, based on positive- and/or negative- 
reinforcement signals — originating, for example, from the diges- 
tive system (Behmer et al., 1999) and relayed via neuromodula- 
tory reward/punishment signals, as shown in a variety of insect 
species (Hammer and Menzel, 1995, 1998; Schwaerzel et al, 2003; 
Unoki et al, 2005). 

Learning can thus sculpt and tune higher neuronal represen- 
tations, bringing neurons downstream of the mushroom body 
to respond to "meaningful" stimuli; i.e., stimuli with ecological 
importance for the animal (Barlow, 1972, 1985). Wiring each of 
these (3-lobe neurons further, to directly trigger a relevant motor- 
program (e.g., for eating, avoidance, escape, etc.) would close the 
loop from perception to action. This would result in a simple neu- 
ral system which receives complex, high-dimensional and noisy 
input and produces reliable animal behavior in response to it — 
in other words, a simple brain that works — and where we are 
approaching a deeper mechanistic understanding of the process. 

SWITCHING BETWEEN CODING SCHEMES 

A large body of work is focused on sparse codes, pointing out 
their many benefits (e.g., Barlow, 1972; Palm, 1980; Baum et al., 
1988; Kanerva, 1988, 1993; Foldiak, 1990, 2002). Sparse codes are 
attractively easy to read out, as few spikes from few neurons trans- 
late to meaning, eliminating the need to integrate over an entire 
population or over a long time (Foldiak, 1990, 2002). Forming 
associations is easy, as learning has to act on few nodes; in the 
theoretical limit-case (one-neuron-per-percept) tuning a single 
synapse suffices to (asymmetrically) link two percepts (Palm, 
1980; Baum et al, 1988; Kanerva, 1993). Complex, meaningful 
ideas can be constructed by wiring-together basic random per- 
cepts (see Section "Kenyon Cells Can Serve as Building Blocks 
for Meaningful (and Plastic) Representations at their Targets"). 
Finally, sparse codes are metabolically economical, although an 
energetic trade-off exists between firing few spikes and main- 
taining many cells (Levy and Baxter, 1996; Attwell and Laughlin, 
2001). Sparse codes are thus attractive and economical substrates 
for computation. 

On the other hand, sparse coding has several serious draw- 
backs. It is wasteful in hardware, as each neuron participates 
only in a small fraction of representations (each percept requires 
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devoted neurons and representations rarely share the same cells). 
It is also extremely sensitive to neuronal damage, as losing neu- 
rons results in loss of precepts or memories (Foldiak, 2002). 

Sparse codes thus seem unlikely candidates for applications 
such as long-term memory storage, but they are very well suited 
for applications such as short-term memory formation and asso- 
ciative learning. I find it attractive to envision the brain as 
functioning by transitioning back and forth between sparse and 
distributed coding schemes across different regions, according 
to the computations needed (Baum et al, 1988; Foldiak, 1990, 
2002). The design principles emerging from the present study 
suggest a neural algorithm, or a recipe, of how such transition 
may be (biologically and algorithmically) accomplished. 

REGAINING COMPLEXITY: RE-EXAMINING THE MODEL'S INITIAL 
ASSUMPTIONS 

The model I presented here relies on several rather crude 
approximations and assumptions (set forth in Section "Model 
Assumptions"). I now re-examine them in light of experimen- 
tal data, and wherever they deviate from biological reality, try to 
assess how the model's predictions are affected. In other words, 
it's time to make things complicated again. 

My first assumption was using discrete time windows, dur- 
ing which PN-spiking is treated as binary (firing or not). The 
locust olfactory system operates with an internal 20-Hz clock 
imposed on it (Laurent and Davidowitz, 1994; Laurent and 
Naraghi, 1994; Perez-Orive et al., 2002, 2004); PNs rarely fire 
more than once per 50 ms cycle (Perez-Orive et al., 2002, 2004), 
with odor-evoked spikes confined to the 25 ms rising-phase of the 
local field-potential oscillation (Laurent and Davidowitz, 1994; 
Laurent et al., 1996; Wehr and Laurent, 1996). The assumption 
is thus justified, at least for odor conditions. During baseline the 
coherence of the PN population is much reduced (as reflected 
by local-field potentials), yet most PNs retain a 20-Hz oscilla- 
tory component, as spike-autocorrelations show (Jortner et al., 
2007). On average, the minimal PN-inter-spike-interval during 
baseline is ~22 ms; so there are definitely sometimes two spikes 
per arbitrary 50-ms window, although rarely more than two 
(Jortner, 2009). This deviation from model assumptions could 
increase the number of EPSPs summing per time window in a 
KC during baseline, and reduce the number of inputs needed 
for threshold-crossing. Having said this, the proportion of spikes 
with inter-spike-intervals below 50 ms is small, and as PN-KC 
synapses show no homo-synaptic facilitation on these time scales 
(Jortner et al., 2007), biases resulting from two spikes per window 
are expected to be small and linear (at most) with spike number. 

A second assumption was i.i.d. spiking across different PNs 
over the course of the integration time — in other words, that 
PN activation-patterns are entirely random. This assumption 
simplified calculations and allowed applying the CLT to the sum- 
mation of inputs. In reality, however, not all antennal-lobe states 
are equally probable given that the animal operates in a natural 
olfactory environment; in fact each individual locust is likely to 
experience in its lifetime only a miniscule fraction of the enor- 
mous number of possible PN activation patterns. Furthermore, 
during odor presentation PNs are affected by common excita- 
tory input from ORNs and common inhibitory input from local 



interneurons, making randomness and mutual independence 
even less likely. In a nutshell, caveats of dependences and corre- 
lations between PNs may bias my analyses. 

Several points support the model's conclusions despite this 
potential bias. First, simultaneous recordings show that spikes 
from different PNs are not correlated over short time scales at 
baseline (Jortner et al, 2007); this does not establish mutual inde- 
pendence but takes a step in that direction. Second, no direct 
synaptic connections were ever found between PNs (Jortner 
and Laurent, in preparation), which eliminates causality from 
contributing to statistical dependence. Finally, the classical CLT 
was extended for cases with dependences between variables 
(Bernstein, 1922, 1927), yielding modern versions of the theo- 
rem which hold under various mutual dependences and correla- 
tions (e.g., French and Wilson, 1978; Wilson, 1981; Reichert and 
Schilling, 1985; Pinske et al, 2007). The deviation from i.i.d. firing 
statistics needs, however, to be borne in mind. 

As a third assumption, all PN-KC synaptic connections were 
treated as equal in strength. Experiments show PN-KC-EPSP 
amplitudes are distributed narrowly, but they are by no means 
uniform (86±44(xV, with half of them within 60-110u,V; 
Jortner et al, 2007). Can ignoring the weights' distribution be 
justified? All calculations throughout this study were based on 
summation of rows of the connectivity matrix W . If the number 
of summed elements n is large enough, the CLT justifies treating 
them as uniform (assuming i.i.d. between connections and finite 
variance, which is reasonable). This holds for "large enough" 
n, but is the length of a connectivity vector, or the number of 
EPSPs summating in a target neuron "large enough"? While the 
CLT strictly applies only when n approaches infinity, it in fact 
converges to Normality very fast as n starts to increase, then slows 
down asymptotically (established by the Berry-Esseen theorem: 
the difference between any CDF with finite variance and the 
Normal CDF decreases as /^/^ ; Feller, 1972). The number of 
summated connections in the model is on the order of N ■ c for 
connection vectors, and on the order of N ■ c ■ p for aggregate 
inputs — so for large N (800, in our case) the assumption is 
justified for all but very small values of c or of c ■ p (with locust 
parameters, N ■ c lies within the hundreds, and N ■ c ■ p is on the 
order of tens). 

The last point has also been addressed directly with simula- 
tions in which sets of EPSPs from the experimental amplitude 
distribution were randomly drawn and summed (Jortner et al., 
2007); for numbers >50, a Gaussian hypothesis for the sum could 
no longer be rejected, and differences between the actual sum 
and its estimate assuming uniformity were minute (Jortner et al., 
2007, Supplementary Material). In conclusion, the model results 
are only minimally biased by the assumption of uniform connec- 
tion strengths because the numbers are sufficiently high; this must 
be reexamined, however, if applying this framework to different 
systems where parameter values may be lower. 

Fourth and last, PN-KC connectivity was treated as random. 
Previous work showed no obvious pattern in the pairs tested 
positive for connections; in fact, most KCs tested simultane- 
ously with several PNs were found to be connected to about 
half of them (Jortner et al, 2007). Anatomically, the mush- 
room body calyx shows no simple patterning (e.g., layered- or 
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columnar-organization) for either PN axons or KC dendrites 
(Farivar, 2005). Thus, no data has suggested patterning in the 
connectivity matrix. This having been said, it is very difficult 
(in fact, impossible) to establish true randomness in experimen- 
tal data, and some patterns may have evaded my analysis. Even 
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suggested by the data, any component 

of randomness in the connectivity matrix would still yield a 
combinatorial explosion of wiring possibilities — so very dramatic 
connection biases would be required to alter the conclusions of 
this study. 

RELATED MODELS AND ALTERNATIVE DESIGNS 

In this study I took the gross structure of the locust olfactory sys- 
tem as starting point and basis for exploration; I did not explore 
all possible architectures or parameters, and by no means claim 
to offer the only solution for constructing specific representations 
from noisy input. A number of theoretical studies have tackled 
similar problems using different approaches, and have come up 
with a variety of designs. Here I briefly survey some of these mod- 
els and their key properties, comparing and contrasting them with 
mine. 

One example is Kanerva's (1988) Sparse Distributed Memory 
model. Its central idea — that memories can be represented as 
binary vectors in a high-dimensional space — stems from a key 
property of such spaces: points in them tend to differ from 
each other — and from most of the remaining space — along many 
dimensions. With this inherent sparseness in mind, a hyper- 
sphere is drawn around each point of interest (memory), and the 
memory is activated whenever an input vector falls within the 
hyper-sphere's boundaries; this grants the model noise-tolerance 
and flexibility more characteristic of brains than of most com- 
puters. As different hyper-spheres may partially overlap, input 
vectors often activate multiple memories. While performance 
depends on dimensionality, number of memories stored and 
activation radii, the model's main results are the feasibility and 
robustness of sparse-distributed storage and retrieval. 

The Sparse Distributed Memory model is fully connected, 
meaning that when classifying an input, its values along all 
dimensions are taken into account; zeros as well as ones. This 
conveys the model its robustness, capacity, and noise tolerance. 
The threshold (the radius of the hyper-sphere) can be adapted if 
needed: for example, to ensure that state-space is tiled, or that 
each output responds with a particular probability. The high- 
dimensional space is thus filled with many partially overlapping 
hyper-spheres of the same dimension as the space; each represents 
one memory and its noise-tolerant envelope. 

At another end of the spectrum of connectivity values is 
Jaeckel's (1989) Selected-Coordinate Design. In this model inputs 
also reside in a binary, high-dimensional space, yet each out- 
put samples just a handful of inputs (10 of 1000; corresponding 
to connection probability 0.01). For a memory to be activated, 
all of its sampled inputs (or selected coordinated) must take 
particular binary values; the rest of the inputs do not matter. 
Jaeckel's model thus attains its noise tolerance via invariance to 
most of the input's dimensions: it only takes into account 10 



and ignores the rest. The threshold is thus fixed, and is equal to 
the number of selected coordinates (they all need to be active). 
In the Selected-Coordinate Design the high-dimensional space 
is thus inhabited by subspaces of lower dimensionality, each 
corresponding to a memory. To think in three dimensions, if 
input space were a cube, memories would be faces (squares) of 
this cube. 

To compare my model with these, it is useful to speak a com- 
mon language. In the locust, input space has 800 dimensions 
(one for each PN), so it is also high-dimensional and binary (as I 
treat each PN as spiking or not within each time window); PN- 
KC connectivity vectors correspond to points of interest in this 
space. The interesting properties of my model rely precisely on the 
inherent sparseness of high-dimensional binary spaces as formu- 
lated by Kanerva: PN-KC connectivity vectors populate a space 
so vast (containing roughly 10 240 potential points) that the actual 
5 x 10 4 points realized tend to populate it extremely sparsely, 
each sitting on average very far from all others. 

What does the portion of space which KCs respond to look like 
in my model, and how does their threshold affect it? In Kanerva's 
model each KC samples all the dimensions and is rather tolerant 
to errors in any of them (via the threshold); in Jaeckel's model it 
samples only very few dimensions, but is very strict about per- 
fectly matching these. For comparison, in my model each KC 
samples half of the dimensions (corresponding to c = Vi) and is 
invariant to the rest. This means that in 800-dimensional space, 
around half of the dimensions — those PNs which the KC is con- 
nected to — are treated as spherical, with a threshold, and the 
others are ignored, thus treated as cubical. The receptive range 
of a KC will thus be an 800-dimensional hyper-cylinder: spheri- 
cal along some dimensions and invariant to the others. Adapting 
the threshold will only affect the spherical dimensions: the larger 
the radius, the lower the threshold, and thus the more states of 
the PN population activate the KC. The model suggested here 
thus combines the high-dimensionality and dense connectivity 
of Sparse Distributed Memory, the invariance to non-connected 
inputs from the Selected-Coordinate Design, and elements of 
noise-tolerance from both. 

WHERE ELSE MAY THESE PRINCIPLES APPLY? 

Neuronal specificity and sparse coding are widespread phenom- 
ena; relevant way beyond olfaction or sensory systems. The 
design principles discussed here are general in nature, relying 
on general assumptions and independent of particulars of the 
system. It is attractive to hypothesize that they may apply in 
a variety of other interconnected systems. While detailed data 
on network architecture — especially connection probabilities — 
is unfortunately still scarce for most biological networks, I point 
out several candidates which merit comparison. 

What happens in other olfactory systems? In Drosophila, KCs 
are concentration invariant and much more specific than PNs 
(Turner et al, 2008; Honegger et al, 2011). KCs each seem to 
receive connections from around 10 PNs (corresponding to con- 
nectivity of ~5%), and have high firing thresholds (Turner et al., 
2008). Differences in design and coding between locusts and 
flies may relate to their ecology: fruit flies occupy highly spe- 
cialized ecological niches whereas locusts are generalist feeders. 
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KC numbers also differ greatly across these species (50,000 in 
locust vs. 2500 in Drosophila); this could merely reflect size con- 
straints, but may also relate to the extent of odor space the 
mushroom body needs to tile, or to the resolution required at 
different regions of the space. 

Mammalian pyriform (olfactory) cortex shows similarities 
with the mushroom body: pyriform pyramidal neurons respond 
to odors with few spikes locked to respiratory oscillations and 
have low baseline firing rates (Poo and Isaacson, 2009). Pyriform 
cortex shows no evidence for spatial organization by odor tuning 
(Illig and Haberly, 2003; Rennaker et al., 2007; Stettler and Axel, 
2009), and axons from individual mitral cells — its input neu- 
rons, analogous to insect-PNs — project onto it diffusely, without 
apparent spatial preference (Friedrich, 2011; Ghosh et al, 2011; 
Miyamichi et al, 2011; Sosulski et al., 2011). Connection prob- 
abilities between mitral cells and pyriform pyramidal cells are 
unknown; however, as these synapses are strong, coincident input 
from just a few may suffice to elicit spiking (Franks and Isaacson, 
2006). This implies — albeit indirectly — that connection probabil- 
ities from second- to third-order neurons in rodents maybe lower 
than in the locust. The level of sparseness of pyriform pyramidal 
neurons is 3-15% — also considerably lower than in locust KCs 
(Poo and Isaacson, 2009; Stettler and Axel, 2009; Isaacson, 2010). 
It remains to be seen how the various network parameters work 
in concert to yield coding solutions in this system. 

One system often treated as a benchmark for decorrelation 
of representations is the cerebellum. Theoretical work by Marr 
(1969) and Kanerva (1988) suggests that the transformation from 
mossy fibers onto cerebellar granule cells is designed to decor- 
relate input representations and reduces the number of nodes 
learning would have to act on; operations precisely suited for a 
neural architecture such as described here. Measurements, how- 
ever, indicate that convergence ratios of mossy fibers onto granule 
cells are much lower (Chadderton et al, 2004); the architecture 
described here may thus not apply to those neurons. 

A fascinating candidate for comparison is the mammalian 
hippocampus. The ability to build meaningful representation 
from discrete random percepts makes sparse codes attractive for 
memory formation (Palm, 1980; Baum et al., 1988; Kanerva, 
1988, 1993). This is a well-established role of both hippocam- 
pus (Scoville and Milner, 1957; Squire, 1992; Tulving and 
Markowitch, 1998) and mushroom body [reviewed in Heisenberg 
(1998)], and the analogy between the two has been previously 
drawn (Strausfeld et al, 1998). Indeed, similarly to KCs, some 
hippocampal neurons use extremely sparse codes: spiking specif- 
ically and reliably in response to complex, high-level stimuli 
and very rarely at baseline (Kreiman et al., 2000; Barnes et al, 
2003; Quian Quiroga et al., 2005), with a majority silent at any 
given time (Thompson and Best, 1989). Topologically, hippocam- 
pus is largely feed-forward (Andersen et al., 1971; O'Reilly and 
McClelland, 1994; Andersen et al, 2000), and while its cytoarchi- 
tecture is extensively studied with classical anatomical techniques 
(e.g., Amaral and Witter, 1989; Patton and McNaughton, 1995), 
quantitative functional connectivity-data at single-cell resolution 
is just emerging (e.g., Brivanlou et al., 2004). 

O'Reilly and McClelland (1994) provide in-depth theoretical 
analysis of hippocampal circuitry. They modeled feed-forward 
components of the circuit (entorhinal cortex, dentate gyrus, 



and CA3), exploring the effects of network parameters on 
pattern-separation and pattern-completion. Testing three values 
of feed-forward connectivity (equivalent to connection prob- 
ability ~0.0001, 0.02, and 0.1), they indeed find that con- 
trary to their intuition, the lowest connectivity value — which 
they had presumed to outperform the higher ones in pattern- 
separation — actually performed worse. They found performance 
similar between the higher values, suggesting a diminishing- 
returns effect; they did not, however, test higher connectivity- 
values approaching ~0.5. It would be tempting to test whether 
architecture within or among some hippocampal sub-regions (for 
example CA3-CA1) may follow similar design to the locust PN- 
KC circuitry, to maximize input separation as a basis for memory 
formation. 

As the design discussed largely relies on random connectivity, 
it is not inherently suitable for circuits where the input's spatial 
relations must be retained, such as early visual- or auditory-areas. 
It may, however, apply well locally within spatially dependent 
modules, such as cortical columns (Mountcastle, 1997), or in 
higher processing areas, where representations become object- 
based and spatially invariant — such as infero-temporal cortex 
(Gross et al, 1972; Perrett et al, 1982; Fujita et al, 1992; Tanaka, 
1996, 2003). 

Optimal input-space separation may be useful even when 
sparse coding is not the goal: the targets' firing threshold deter- 
mines response probability; if it is low, neurons will respond 
broadly. For example, connectivity Vi and a low firing threshold 
can generate distributed representations from sparse ones. 

Core mechanisms elucidated here may still apply even 
with connectivity somewhat removed from the optimum: large 
enough cell-numbers, intermediate connectivity and some inher- 
ent randomness lead to a combinatorial explosion of wiring 
possibilities. This in turn naturally results in input spaces which 
are (by virtue of their mere size) extremely sparsely populated. 
A central message of this study is that to attain efficient input- 
spread, a suitable source-target connectivity regime is neither 
very dense, nor very sparse, but rather within an intermediate 
range. 

CONCLUDING REMARKS: ORIGINS OF NEURONAL 
SPECIFICITY AND THE PARSING OF THE OLFACTORY WORLD 

Neuronal specificity and sparse neural coding have continu- 
ously attracted attention over several decades of brain research 
(e.g., Attneave, 1954; Marr, 1969, 1970; Willshaw and Longuet- 
Higgins, 1970; Barlow, 1972; Palm, 1980; Baum et al, 1988; 
Kanerva, 1988; Tsodyks and Feigel'Man, 1988; Perez-Vicente 
and Amit, 1989; Foldiak, 1990; Rolls and Tovee, 1995; Vinje 
and Gallant, 2000; Willmore and Tolhurst, 2001; Simoncelli and 
Olshausen, 2001; Hahnloser et al., 2002; Laurent, 2002; Perez- 
Orive et al., 2002; Garcia-Sanchez and Huerta, 2003; DeWeese 
et al, 2003; Olshausen and Field, 2004; Huerta et al, 2004; 
Quian Quiroga et al., 2005; Jortner et al., 2007). One reason 
may be that they highlight a truly fundamental property of 
the brain: the ability to parse the surroundings and to extract 
meaning from them. Indeed, it seems that once a network of 
neurons can — through integration of external sensory inputs and 
a series of computations — bring single target-cells to respond 
differentially and reliably to particular objects, combinations of 
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features or classes of stimuli, a significant part of the way towards 
performing the brain's tasks has already been made. A set of 
such "meaningfully responding" cells constitutes the very inter- 
nal model of the world in the organism's brain — molded to its 
ecologically dictated requirements and reflecting the world as the 
animal views it. For example, characterization of a cell ensem- 
ble which represents a complex percept, such as an Apple, puts 
a handle on what thinking of an Apple is (Barlow, 1972); and 
strengthening a set of connections between this cell ensemble and 
another representing the concept of Cake creates both an associa- 
tive link, and a higher, more complex idea. Mechanistic insights 
into how such representations come into being can open an inti- 
mate window onto the brain's subjective world-view and what 
forms it. 

The system I have analyzed here does not yet offer direct access 
to this level of meaningful representations, but it does highlight 
the principles on the basis of which they can emerge. The princi- 
ples along which the olfactory circuitry between the antennal-lobe 
and mushroom body is designed in the locust are an increase 
in dimensionality between source- and target-populations, feed- 
forward connectivity with probability of Vi, maximizing sepa- 
ration between representations; and a high and adaptive firing 
threshold. This leads to specific, reliable, and sparse represen- 
tations of random olfactory percepts in the mushroom body. 
Specificity is explained by a high enough threshold, only crossed 
when the KC encounters an appropriate input vector (from a set 
of vectors which lie within a particular radius of Hamming dis- 
tances from an "ideal" central vector); very different from vectors 
which drive other KCs. Reliability results from the combination 
of strong convergence of PNs onto KCs (400:1) and cycle-by- 
cycle adjustment of the KC firing threshold (Papadopoulou et al., 
2011). 



In the following relay subsets of KCs are sampled by extrinsic 
P-lobe cells through sparse and strong synapses which are highly 
plastic (Cassenaer and Laurent, 2007); this can be used to build 
and learn meaningful representations for P-lobe neurons, con- 
structed from the sparse discrete percepts randomly assigned to 
KCs (Barlow, 1972; Foldiak, 2002). Cells responding to "mean- 
ingful" stimuli (for example, plants with high protein-content, or 
toxic plants) can directly activate motor programs — causing the 
insect to respond to the stimulus with an appropriate behavior 
(for example foraging or avoidance, respectively). 

The locust olfactory circuitry emerges from this study as 
general-purpose machinery for information processing: a neu- 
ral module which receives highly distributed and noisy inputs, 
spreads them maximally, and creates from them an arbitrar- 
ily sparse and selective set of representations — to be used 
as a substrate for learning, memory formation, categoriza- 
tion/generalization, triggering behavioral programs, and poten- 
tially a variety of other computations. These principles are suited 
to process any input where spatial relations need not be con- 
served, as they depend only to a limited extent on the nature 
of the signals to be processed. These mechanisms are therefore 
potentially of broad applicability and interest; where else they may 
apply remains to be seen. 
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APPENDIX (JORTNER, 2012) 

A1. HAMMING DISTANCE BETWEEN PN-KC CONNECTIVITY 
VECTORS— FULL DERIVATION 

Let U, V be two arbitrary connectivity vectors, or rows of 
the connectivity matrix W (where 1 denotes connection, with 
probability c, and 0 none, with probability 1 — c) between the 
populations S and K. The Hamming distance, H(U , V), 
counts the number of bits different across the two vectors. I derive 
I^H ^ U , V , the average Hamming distance between 17 , V 

over all their possible values: 

H^)L=(x>-^) 

\> = 1 / r/,y 

= (X>f-2lW + V?)\ 
\» = I I uy 

as the average of a sum is the sum of the averages, 



\i = 1 

A' 



u 



1 = 1 

N 



U,V «=1 
N 



= e<^- 2 E<^>^+e<^ 



i = i 



<= i 



! = 1 



Finishing the calculation now requires the expected values of the 
expressions Uf.Vf, UjVi. The following table gives all possible 
values of Uj, L7? and their respective probabilities: 



u, 


u) 


Probability 


1 


1 


c 


0 


0 


(1 -c) 


So the expected value of ([/?)„ = 1 ■ 


c +0 - (1 - c) = c; the 


same holds for (vf) v . 






Here are all possible values of [/,-, Vj, 


UiVi and their respective 


probabilities: 






U, V, 


t/,v, 


Probability 


1 1 


1 


c 2 


1 0 


0 


c- (1 -c) 


0 1 


0 


(1 - c) ■ c 


0 0 


0 


(1 - c) 2 



So the expected value is 
{V i U i ) u , v = l- c 2 + 0- (c- (l-c) + (l-c)- c+(l-c) 2 ) =c 2 
Finishing the calculation: 

(H(L>, V)) |7V =N- c-2N- c 2 +/V- c=2JV- c- (1 - c) 



A2. VARIANCE OF INPUT TO A KC (A)— FULL DERIVATION 

I substitute the mean input to a KC by * , which was already calcu- 
lated (Section "Model Results II: Neuronal Activity and Properties 
of Input to KCs"): 



A 



(var(fc ( ))i = (((k - <fci>s) 2 ) s ). 



*) 2 >.) 



E s j w 'j - * 



V; = i / ;=i 



J2 E - 2 * E + * 2 ) ) 

j= lk = 1 ;' = 1 / 5/ 

E E ( S M w « w * " 2 * E ( s ;>s w 'i + * 2 

i = 1 k = 1 i = 1 



Next, I'll separate the first-term into its non-diagonal (i and 
diagonal (i = j) components and treat them each separately: 



N N 



N 



A = E E ( s j s *)s w v w *+E\ s j}s w 1 
\j=ik=i,#k j=i 

N 

- 2 *E(% W 9 + * 2 

J = 1 

To calculate the expected values of the non-diagonal terms 
Si, Sj, S{Sj, the table below provides all possible values and their 
respective probabilities: 



S, Sj 


SiSj 


Probablity 


1 1 


1 


P 2 


1 0 


0 


p- (1 -P) 


0 1 


0 


(1 -P)- P 


0 0 


0 


(1 - P) 2 


So the expected value is 






( s >' s 4, w = i-p 2 + o- (p- (i 


-p) + (l-p)- p-\ 


" (1"P) 2 )=P 2 


For the diagonal terms: Sj, Sj 


and their respective probabilities: 


Sj Sj 




Probability 


1 1 




P 


0 0 




(1 -P) 



Frontiers in Neuroengineering 



www.frontiersin.org 



January 2013 | Volume 5 | Article 19 | 23 



Jortner 



Network architecture for separating representations 



The expected value of (S 2 >? = 1 • p + 0 • (1 - p) = p, the IIJL JL JL 

same holds for {Sj ) 5 = 1 • p + 0 • (1 - p) = p = ( ( E E l "E^ ' 

Continuing the calculation: 



; = i; = i 



N N N 

•2 



A =^ 2 - E E (w^+p- E( w i 



;= i 



i=l;= 1 



All possible values for (Wj-Wft)^, (W|)y, (W$)y and their -* £ (S,>s ■ W ri - * £ (% ■ W t] + * 2 \ 

respective probabilities: 1 = 1 7 = 1 r ^ t 





W ri W, k 


Probability 


1 1 


1 


c 2 


1 0 


0 


c- (1 -c) 


0 1 


0 


(1 - c) • c 


0 0 


0 


(1 - c) 2 


So the expected value is 






(WjjW 1k ) Wik = 1- e 2 + 0- (c- (1 


- c) + (1 - c) • 


c+(l-c) 2 ) = c 2 






Probability 


1 1 
0 0 




c 

(1 - c) 



nents: 

N N N 



E E ( s >' s ib • + E ( s « >s • WriWij - . . 
i=l;=l,<# (=1 

N N \ 

-*E^s-^.-*E( s ;>s-^ + * 2 



2 



I calculate the expected values of the terms S,S ; -, S 2 , S,- exactly as 
in Appendix A2: 



p 2 • E E w " Wf i + p • E WriW " -p • * E w " - 

! = 1;= 1, i^j != 1 i= 1 

N 



the expected value of = 1 • c + 0 • (1 — c) = c — p • il» ^ WV + * 2 

and the same holds for (Wj^ • = 1 • c + 0 • (1 — c) = c i= 1 ' r ^ t 

There are exactly (N 2 — N) non-diagonal terms, and N diago- 

naltermS ' S0 =P 2 - J2 E { W nW,j) r ^ t +p- £WWi>r/f 

A=p 2 - (N 2 -N) ■ c 2 +p- N- c-W-p-N- c+* 2 



N N N 



I will now substitute back * = N ■ p ■ c (as shown in Section ~P ' * E < w ™} r ^ t ~P • * E ( W 'i)r ^ t + * 2 

"Model Results II: Neuronal Activity and Properties of Input to 1=1 > = 1 

KCs"), and get: 

and because of the condition r / f , the expected value of the term 
A = * 2 - p 2 ■ N ■ c 2 + p ■ N ■ c - 2* 2 + * 2 { W n W tj) r ^t,i^j is identical to that of (WriWti),^ t , both equal to c 2 . 

The rest of the terms are calculated exactly as in A2, so: 

= N ■ p - c- (1 — p- c) =N ■ p- c- (1 — p- c) 



p 2 ■ (N 2 - N) ■ c 2 + p ■ N ■ c 2 - p ■ * • N ■ c - p ■ * • N ■ c + * 2 
A3. COVARIANCE OF THE INPUTS TO TWO KCs— FULL DERIVATION 222 2222 

To calculate the covariance between inputs (or between sub- ~* ~ P ' " ' c + P' N ■ c — * — * +* 
threshold membrane potentials) of two KCs, I substitute the mean = N • c 2 ■ p - (1 — p) 
input to a KC by which was already calculated (Section "Model 

Results II: Neuronal Activity and Properties of Input to KCs"). M DIFFERENCE BETWEEN TWO KC INPUTS (OR SUB-THRESHOLD 
/ (v tr w IKir 11 WU MEMBRANE POTENTIALS) 

(cov(fc r , k t )) r¥=t = (fcr- (fc r >s )(k, - {k,y s )|L, ( Tf „ .. . . . ,. 

j'/^/r^t Here I follow the exact same lines of reasoning as m Appendix 

(n \ / n \\ \ A1-A3. First, I express the difference between KC inputs, second, 

SjWrj — * jl ^ SjW t j — * ) ) ) " s P^' t ^ i nt0 i ts non-diagonal and diagonal components, and third, 

1 = 1 /\;'=i /'s'r^t I average each class of terms separately: 
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D(kr, k t ) S {{(k r - k t )\)^ t = (( ( E S,W ri - £ Sj W (; 

ii=l ;=1 , 



N N \ 2 



S' r j£ t 



X! s - w "- - X! s j w 9 ( E SmWm ~ E SnWtn ) ) ) 

J=l j=l J \m=l n=l 'l-Jr^t 



E SiSnWriWrm) j -((EE S i S " W ri W t> 



N N \ \ I I N N 

£ £ s ; s ffl w (i w rm \ \ +// J2 E s i s " w v w <« 
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N N \ \ I I N 
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N N N 
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- E E ( S M • ( w ^™) r ^ t - E ( s i ) s • < w '; w ^) r * t + • • ■ 

j = 1 m = l.j^z m j = 1 

N N N 

+ E E ( s ; s «)§ • ^ t + E (4 • (w|) r t 

j = \ n=X,j^n j = 1 

= N(N — 1) • p 2 • c 2 + N ■ p- c — N(N — 1) ■ p 2 ■ c 2 — N ■ p- c 2 — . . . 

—N(N — \) ■ p 2 ■ c 2 — N ■ p - c 2 + N(N — \) ■ p 2 ■ c 2 + N ■ p - c 
= 2N- p- c- (1 - c) 

A5. FIRING THRESHOLD WHEN CONNECTIVITY IS 1/2 should be equal to the mean (*) plus the appropriate times the 

A threshold designed to be crossed for a particular fraction of states standard deviation (n/A). 
1 — Q(z), where Q is the CDF of the standard Normal distribution, Assuming c = Vi, we get: 

f(z) = * + z\/A = N- p - c + Zy/N- p - c- (1 -p - c) 

1 N-p [N~V~ / pT N-p + zJN-p- (2-p) 

c=-^m = — (i--) = 
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